Revised Conditional t-SNE: Looking Beyond the Nearest Neighbors

Authors

Abstract

Conditional t-SNE (ct-SNE) is a recent extension to t-SNE that allows removal of known cluster information from the embedding, to obtain a visualization revealing structure beyond label information. This is useful, for example, when one wants to factor out unwanted differences between a set of classes. We show that ct-SNE fails in many realistic settings, namely if the data is well clustered over the labels in the original high-dimensional space. We introduce a revised method that conditions the high-dimensional similarities instead of the low-dimensional similarities, and stores within- and across-label nearest neighbors separately. This also enables the use of recently proposed speedups for t-SNE, improving scalability. From experiments on synthetic data, we find that our method resolves the considered problems and improves embedding quality. On real data containing batch effects, the expected improvement is not always there. We argue the revised method is preferable overall, given its improved scalability. The results also highlight new open questions, such as how to handle distance variations between clusters.
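The two ingredients the abstract mentions, conditioning the high-dimensional similarities on the labels and keeping within- and across-label nearest neighbors in separate lists, can be sketched roughly as follows. This is an illustrative sketch only, not the paper's formulation: the down-weighting factor `alpha`, the fixed bandwidth `sigma`, and both helper names are assumptions for exposition.

```python
import numpy as np

def conditioned_affinities(X, labels, sigma=1.0, alpha=0.1):
    """Gaussian affinities with same-label pairs down-weighted.

    Illustrative label-conditioned high-dimensional similarities:
    alpha < 1 reduces the contribution of within-label pairs so an
    embedding built from P can reveal structure beyond the labels.
    """
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    same = labels[:, None] == labels[None, :]
    P = np.where(same, alpha * P, P)         # condition on labels
    return P / P.sum(axis=1, keepdims=True)  # row-normalize

def split_neighbors(X, labels, k=3):
    """Store each point's k nearest within-label and across-label
    neighbors separately (brute-force distances for clarity)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    within, across = [], []
    for i in range(len(X)):
        same = labels == labels[i]
        diff = ~same
        same[i] = False  # a point is not its own neighbor
        within.append(np.where(same)[0][np.argsort(d2[i, same])][:k])
        across.append(np.where(diff)[0][np.argsort(d2[i, diff])][:k])
    return within, across
```

Keeping the two neighbor lists separate is what allows approximate nearest-neighbor accelerations to be applied per label group rather than over the whole dataset.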


Related articles

Naive Bayes Image Classification: Beyond Nearest Neighbors

Naive Bayes Nearest Neighbor (NBNN) has been proposed as a powerful, learning-free, non-parametric approach for object classification. Its good performance is mainly due to the avoidance of a vector quantization step, and the use of image-to-class comparisons, yielding good generalization. In this paper we study the replacement of the nearest neighbor part with more elaborate and robust (sparse...
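The image-to-class comparison that NBNN is built on can be sketched as follows: each query descriptor is matched to its nearest descriptor within each class, and the class with the smallest summed distance wins. A minimal sketch with plain arrays, not the paper's sparse replacement of the nearest-neighbor step; the function name and dict layout are assumptions.

```python
import numpy as np

def nbnn_classify(query_descs, class_descs):
    """Naive Bayes Nearest Neighbor: sum each query descriptor's
    squared distance to its nearest descriptor per class, and
    return the class with the smallest image-to-class total.
    class_descs maps label -> (n_c, d) array of that class's
    descriptors pooled over its training images.
    """
    best, best_dist = None, np.inf
    for label, descs in class_descs.items():
        d2 = np.sum((query_descs[:, None, :] - descs[None, :, :]) ** 2,
                    axis=-1)
        total = d2.min(axis=1).sum()   # image-to-class distance
        if total < best_dist:
            best, best_dist = label, total
    return best
```

Because descriptors are compared to a whole class rather than to individual images, no vector quantization step is needed, which is the generalization advantage the abstract refers to.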


Quantifying long-range correlations in complex networks beyond nearest neighbors

We propose a fluctuation analysis to quantify spatial correlations in complex networks. The approach considers the sequences of degrees along shortest paths in the networks and quantifies the fluctuations in analogy to time series. In this work, the Barabasi-Albert (BA) model, the Cayley tree at the percolation transition, a fractal network model, and examples of real-world networks are studied...


Nearest-neighbors medians clustering

We propose a nonparametric cluster algorithm based on local medians. Each observation is substituted by its local median and this new observation moves toward the peaks and away from the valleys of the distribution. The process is repeated until each observation converges to a fixpoint. We obtain a partition of the sample based on the convergence points. Our algorithm determines the number of c...
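The iteration described above, replacing each observation by a local median until a fixpoint is reached and partitioning by convergence points, can be sketched as follows. This is an illustrative reading of the abstract, not the authors' algorithm; the function name, neighborhood size `k`, and tolerances are assumptions.

```python
import numpy as np

def median_shift(X, k=3, max_iter=50, tol=1e-6):
    """Sketch of nearest-neighbors medians clustering: each point is
    repeatedly replaced by the coordinate-wise median of its k nearest
    neighbors (itself included), moving toward peaks of the density,
    until nothing moves; points sharing a fixpoint form a cluster."""
    Z = X.astype(float).copy()
    for _ in range(max_iter):
        d2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
        nn = np.argsort(d2, axis=1)[:, :k]   # k nearest, incl. self
        Znew = np.median(Z[nn], axis=1)
        moved = np.max(np.abs(Znew - Z))
        Z = Znew
        if moved < tol:
            break
    # partition by convergence points
    labels, centers = [], []
    for z in Z:
        for j, c in enumerate(centers):
            if np.allclose(z, c, atol=1e-3):
                labels.append(j)
                break
        else:
            centers.append(z)
            labels.append(len(centers) - 1)
    return np.array(labels), Z
```

Note that, as in mean-shift, the number of clusters is not specified up front; it emerges from how many distinct fixpoints the iteration produces.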


Boruvka Meets Nearest Neighbors

Computing the minimum spanning tree (MST) is a common task in the pattern recognition and the computer vision fields. However, little work has been done on efficient general methods for solving the problem on large datasets where graphs are complete and edge weights are given implicitly by a distance between vertex attributes. In this work we propose a generic algorithm that extends the classic...
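The classic Boruvka scheme this abstract builds on can be sketched for exactly the setting it describes: a complete graph whose edge weights are given implicitly by distances between vertex attributes. A brute-force illustrative sketch only, with no nearest-neighbor index; the paper's actual extension is not reproduced here.

```python
import numpy as np

def boruvka_mst(X):
    """Boruvka's MST on the implicit complete graph whose edge weight
    (i, j) is the Euclidean distance between rows i and j of X.
    Each round, every component selects its cheapest outgoing edge,
    then components merge, until a single tree remains."""
    n = len(X)
    parent = list(range(n))

    def find(i):  # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    d = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    edges = []
    while len(edges) < n - 1:
        cheapest = {}                 # component root -> (w, i, j)
        for i in range(n):
            ri = find(i)
            for j in range(n):
                if find(j) != ri:
                    w = d[i, j]
                    if ri not in cheapest or w < cheapest[ri][0]:
                        cheapest[ri] = (w, i, j)
        for w, i, j in cheapest.values():
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                edges.append((i, j, w))
    return edges
```

The appeal of Boruvka here is that the per-round "cheapest outgoing edge" queries are exactly nearest-neighbor queries, which is what makes the nearest-neighbors connection in the title possible.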


Iterative Nearest Neighbors

Representing data as a linear combination of a set of selected known samples is of interest for various machine learning applications such as dimensionality reduction or classification. k-Nearest Neighbors (kNN) and its variants are still among the best-known and most often used techniques. Some popular richer representations are Sparse Representation (SR) based on solving an l1-regularized lea...



Journal

Journal title: Lecture Notes in Computer Science

Year: 2023

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-30047-9_14